A Hellinger-based discretization method for numeric attributes in classification learning
Abstract
Many classification algorithms require that training examples contain only discrete values. To use these algorithms when some attributes have continuous numeric values, the numeric attributes must first be converted into discrete ones. This paper describes a new way of discretizing numeric values using information theory. Our method is context-sensitive in the sense that it takes into account the value of the target attribute. The amount of information each interval gives to the target attribute is measured using Hellinger divergence, and the interval boundaries are chosen so that each interval contains as nearly equal an amount of information as possible. To compare our discretization method with some current discretization methods, several popular classification data sets are selected for discretization. We use the naive Bayesian classifier and C4.5 as classification tools to compare the accuracy of our discretization method with that of other methods. © 2006 Elsevier B.V. All rights reserved.
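The abstract does not give the formula, so the following is only a minimal sketch of how the information in each interval might be scored. It assumes the score is the Hellinger divergence (unnormalized form; some texts add a 1/√2 factor) between the class distribution inside an interval and the overall class distribution. The function names, the choice of half-open intervals, and the zero score for an empty interval are all illustrative assumptions, not the paper's definitions, and the boundary search itself is not shown.

```python
import math
from collections import Counter


def hellinger(p, q):
    """Unnormalized Hellinger divergence between two discrete distributions."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def class_distribution(labels, classes):
    """Relative frequency of each class, in the order given by `classes`."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in classes]


def interval_information(values, labels, cut_points):
    """Score each interval (lo, hi] by how far its class distribution
    lies from the overall class distribution (assumed scoring rule)."""
    classes = sorted(set(labels))
    prior = class_distribution(labels, classes)
    edges = [-math.inf, *sorted(cut_points), math.inf]
    scores = []
    for lo, hi in zip(edges, edges[1:]):
        inside = [y for x, y in zip(values, labels) if lo < x <= hi]
        if not inside:
            scores.append(0.0)  # assumption: an empty interval carries no information
            continue
        scores.append(hellinger(class_distribution(inside, classes), prior))
    return scores


# Toy data: one numeric attribute, two classes, one candidate cut point.
values = [1, 2, 3, 4, 5, 6]
labels = ["a", "a", "a", "b", "b", "b"]
info = interval_information(values, labels, cut_points=[3.5])
print(info)  # the two intervals receive the same score on this symmetric data
```

A boundary search in the spirit of the abstract would compare such scores across candidate cut points and prefer the set whose intervals score most evenly.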
Similar resources
Discretizing Continuous Attributes Using Information Theory
Many classification algorithms require that training examples contain only discrete values. In order to use these algorithms when some attributes have continuous numeric values, the numeric attributes must be converted into discrete ones. This paper describes a new way of discretizing numeric values using information theory. The amount of information each interval gives to the target attribute ...
An Iterative Improvement Approach for the Discretization of Numeric Attributes in Bayesian Classifiers
The Bayesian classifier is a simple approach to classification that produces results that are easy for people to interpret. In many cases, the Bayesian classifier is at least as accurate as much more sophisticated learning algorithms that produce results that are more difficult for people to interpret. To use numeric attributes with Bayesian classifier often requires the attribute values to be ...
Model Trees for Hybrid Data Type Classification
In the task of classification, most learning methods are suitable only for certain data types. For a hybrid dataset consisting of nominal and numeric attributes, some attributes must be transformed into the appropriate types before the learning algorithms can be applied. This procedure could damage the nature of the dataset. We propose a model tree approach to integrate several characteristically different lea...
Cost Sensitive Discretization of Numeric Attributes
Many algorithms in decision tree learning have not been designed to handle numerically-valued attributes very well. Therefore, discretization of the continuous feature space has to be carried out. In this article we introduce the concept of cost-sensitive discretization as a preprocessing step to induction of a classifier and as an elaboration of the error-based discretization method to obtain ...
A Local and Global Discretization Method
Most machine learning and data mining algorithms require that the training data contain only discrete attributes, which makes it necessary to discretize continuous numeric attributes. Bottom-up discretization algorithms are well-known methods. They mainly focus on discretizing data based on either local or global independence measure. In this paper, we present a novel bottom-up discretization m...
Journal title:
- Knowl.-Based Syst.
Volume: 20
Publication date: 2007